OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A Database for Arabic Printed Character Recognition

Internal identifier: 000C96 (Main/Exploration); previous: 000C95; next: 000C97

Authors: Ashraf Abdelraouf [United Kingdom, Egypt]; A. Higgins [United Kingdom]; Mahmoud Khalil [Egypt]

Source: Lecture Notes in Computer Science, 2008 (ISSN 0302-9743)

RBID : ISTEX:3316A075D231D4BF55FA989BE58CF8E3FD9C8520

Abstract

Electronic Document Management (EDM) technology is being widely adopted because it makes the routing and retrieval of documents efficient. Optical Character Recognition (OCR) is an important front end for such technology. Excellent OCR now exists for Latin-based languages, but there are few systems that read Arabic, which limits the penetration of EDM into Arabic-speaking countries. In developing an OCR system for Arabic it is necessary to create a database of Arabic words. Such a database has many uses in addition to training and testing a recognition system. This paper provides a comprehensive study and analysis of Arabic words and explains how such a database was constructed. Unlike earlier studies, this paper describes a database developed using a large number of collected Arabic words (6 million). It also considers connected segments, or Pieces of Arabic Words (PAWs), as well as Naked Pieces of Arabic Word (NPAWs), that is, PAWs without diacritics. Background information concerning the Arabic language is also presented.

Url: https://api.istex.fr/document/3316A075D231D4BF55FA989BE58CF8E3FD9C8520/fulltext/pdf
DOI: 10.1007/978-3-540-69812-8_56


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A Database for Arabic Printed Character Recognition</title>
<author>
<name sortKey="Abdelraouf, Ashraf" sort="Abdelraouf, Ashraf" uniqKey="Abdelraouf A" first="Ashraf" last="Abdelraouf">Ashraf Abdelraouf</name>
</author>
<author>
<name sortKey="Higgins, A" sort="Higgins, A" uniqKey="Higgins A" first="A" last="Higgins">A. Higgins</name>
</author>
<author>
<name sortKey="Khalil, Mahmoud" sort="Khalil, Mahmoud" uniqKey="Khalil M" first="Mahmoud" last="Khalil">Mahmoud Khalil</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:3316A075D231D4BF55FA989BE58CF8E3FD9C8520</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1007/978-3-540-69812-8_56</idno>
<idno type="url">https://api.istex.fr/document/3316A075D231D4BF55FA989BE58CF8E3FD9C8520/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000443</idno>
<idno type="wicri:Area/Istex/Curation">000436</idno>
<idno type="wicri:Area/Istex/Checkpoint">000763</idno>
<idno type="wicri:doubleKey">0302-9743:2008:Abdelraouf A:a:database:for</idno>
<idno type="wicri:Area/Main/Merge">000D08</idno>
<idno type="wicri:Area/Main/Curation">000C96</idno>
<idno type="wicri:Area/Main/Exploration">000C96</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">A Database for Arabic Printed Character Recognition</title>
<author>
<name sortKey="Abdelraouf, Ashraf" sort="Abdelraouf, Ashraf" uniqKey="Abdelraouf A" first="Ashraf" last="Abdelraouf">Ashraf Abdelraouf</name>
<affiliation wicri:level="4">
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>School of Computer Science, The University of Nottingham, Nottingham</wicri:regionArea>
<orgName type="university">Université de Nottingham</orgName>
<placeName>
<settlement type="city">Nottingham</settlement>
<region type="nation">Angleterre</region>
<region type="région" nuts="1">Nottinghamshire</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Faculty of Computer Science, Misr International University, Cairo</wicri:regionArea>
<wicri:noRegion>Cairo</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Royaume-Uni</country>
</affiliation>
</author>
<author>
<name sortKey="Higgins, A" sort="Higgins, A" uniqKey="Higgins A" first="A" last="Higgins">A. Higgins</name>
<affiliation wicri:level="4">
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>School of Computer Science, The University of Nottingham, Nottingham</wicri:regionArea>
<orgName type="university">Université de Nottingham</orgName>
<placeName>
<settlement type="city">Nottingham</settlement>
<region type="nation">Angleterre</region>
<region type="région" nuts="1">Nottinghamshire</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Royaume-Uni</country>
</affiliation>
</author>
<author>
<name sortKey="Khalil, Mahmoud" sort="Khalil, Mahmoud" uniqKey="Khalil M" first="Mahmoud" last="Khalil">Mahmoud Khalil</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Faculty of Engineering, Ain Shams University, Cairo</wicri:regionArea>
<wicri:noRegion>Cairo</wicri:noRegion>
</affiliation>
<affiliation>
<wicri:noCountry code="no comma">E-mail: khalil_mik@yahoo.com</wicri:noCountry>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2008</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">3316A075D231D4BF55FA989BE58CF8E3FD9C8520</idno>
<idno type="DOI">10.1007/978-3-540-69812-8_56</idno>
<idno type="ChapterID">56</idno>
<idno type="ChapterID">Chap56</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Electronic Document Management (EDM) technology is being widely adopted because it makes the routing and retrieval of documents efficient. Optical Character Recognition (OCR) is an important front end for such technology. Excellent OCR now exists for Latin-based languages, but there are few systems that read Arabic, which limits the penetration of EDM into Arabic-speaking countries. In developing an OCR system for Arabic it is necessary to create a database of Arabic words. Such a database has many uses in addition to training and testing a recognition system. This paper provides a comprehensive study and analysis of Arabic words and explains how such a database was constructed. Unlike earlier studies, this paper describes a database developed using a large number of collected Arabic words (6 million). It also considers connected segments, or Pieces of Arabic Words (PAWs), as well as Naked Pieces of Arabic Word (NPAWs), that is, PAWs without diacritics. Background information concerning the Arabic language is also presented.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Royaume-Uni</li>
<li>Égypte</li>
</country>
<region>
<li>Angleterre</li>
<li>Nottinghamshire</li>
</region>
<settlement>
<li>Nottingham</li>
</settlement>
<orgName>
<li>Université de Nottingham</li>
</orgName>
</list>
<tree>
<country name="Royaume-Uni">
<region name="Angleterre">
<name sortKey="Abdelraouf, Ashraf" sort="Abdelraouf, Ashraf" uniqKey="Abdelraouf A" first="Ashraf" last="Abdelraouf">Ashraf Abdelraouf</name>
</region>
<name sortKey="Abdelraouf, Ashraf" sort="Abdelraouf, Ashraf" uniqKey="Abdelraouf A" first="Ashraf" last="Abdelraouf">Ashraf Abdelraouf</name>
<name sortKey="Higgins, A" sort="Higgins, A" uniqKey="Higgins A" first="A" last="Higgins">A. Higgins</name>
<name sortKey="Higgins, A" sort="Higgins, A" uniqKey="Higgins A" first="A" last="Higgins">A. Higgins</name>
</country>
<country name="Égypte">
<noRegion>
<name sortKey="Abdelraouf, Ashraf" sort="Abdelraouf, Ashraf" uniqKey="Abdelraouf A" first="Ashraf" last="Abdelraouf">Ashraf Abdelraouf</name>
</noRegion>
<name sortKey="Khalil, Mahmoud" sort="Khalil, Mahmoud" uniqKey="Khalil M" first="Mahmoud" last="Khalil">Mahmoud Khalil</name>
</country>
</tree>
</affiliations>
</record>
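
As an illustration of how a record like the one above might be read programmatically, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not part of Dilib. The SAMPLE string is a trimmed, hand-copied excerpt of the record, and the function name summarize is hypothetical. The wicri: attributes are deliberately left out of the excerpt, on the assumption that a strict XML parser would reject the record as displayed because the wicri prefix is not declared in it.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the record above (assumption: wicri: attributes and
# most fields are omitted so the excerpt parses with a strict XML parser).
SAMPLE = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A Database for Arabic Printed Character Recognition</title>
<author><name first="Ashraf" last="Abdelraouf">Ashraf Abdelraouf</name></author>
<author><name first="A" last="Higgins">A. Higgins</name></author>
<author><name first="Mahmoud" last="Khalil">Mahmoud Khalil</name></author>
</titleStmt>
<publicationStmt>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1007/978-3-540-69812-8_56</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def summarize(xml_text):
    """Return title, author names, year and DOI from a record excerpt."""
    root = ET.fromstring(xml_text)
    title_stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    pub_stmt = root.find("./TEI/teiHeader/fileDesc/publicationStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        "year": pub_stmt.find("date").get("year"),
        # ElementTree's limited XPath supports [@attribute='value'] predicates.
        "doi": pub_stmt.findtext("idno[@type='doi']"),
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

To process the full record rather than an excerpt, one option would be to bind the wicri prefix to a namespace on the root element before parsing; the namespace URI to use is not given on this page.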

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000C96 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000C96 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:3316A075D231D4BF55FA989BE58CF8E3FD9C8520
   |texte=   A Database for Arabic Printed Character Recognition
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024